10 research outputs found


    Co-scheduling algorithms for high-throughput workload execution

    This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each of them with an assigned number of processors, with the constraint that the total number of processors assigned within a pack does not exceed the maximum number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time (40% improvement on average over traditional scheduling) and to faster response times (50% improvement). Hence co-scheduling increases system throughput and saves energy, leading to significant benefits from both the user and system perspectives.
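The trade-off behind packing can be sketched numerically. The toy example below is an illustration, not one of the paper's heuristics: it assumes an Amdahl-law speedup model with a made-up serial fraction `alpha` and made-up workloads, and compares running two applications one by one on all processors against co-scheduling them in a single pack with the best integer processor split.

```python
# Toy co-scheduling sketch; the speedup model and all numbers are assumptions.
# Amdahl's law: on p processors, an application with sequential work w and
# serial fraction alpha runs in w * (alpha + (1 - alpha) / p).

def exec_time(work, procs, alpha=0.1):
    return work * (alpha + (1.0 - alpha) / procs)

def one_by_one(works, P):
    """Traditional schedule: each application runs alone on all P processors."""
    return sum(exec_time(w, P) for w in works)

def best_pack(works, P):
    """One pack holding both applications: try every integer processor split
    and keep the split minimizing the pack time (the slower of the two)."""
    w1, w2 = works
    return min(max(exec_time(w1, p), exec_time(w2, P - p))
               for p in range(1, P))

works, P = [8.0, 4.0], 8          # made-up sequential workloads
print(one_by_one(works, P))       # running the applications one by one
print(best_pack(works, P))        # co-scheduling them in a single pack
```

With these numbers the pack finishes sooner than the one-by-one schedule, because the serial fraction leaves most processors idle whenever one application monopolizes the machine.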

    Co-scheduling algorithms for cache-partitioned systems

    Cache-partitioned architectures allow subsections of the shared last-level cache (LLC) to be exclusively reserved for some applications. This technique dramatically limits interactions between applications that are concurrently executing on a multi-core machine. Consider n applications that execute concurrently, with the objective to minimize the makespan, defined as the maximum completion time of the n applications. Key scheduling questions are: (i) which proportion of cache and (ii) how many processors should be given to each application? Here, we assign rational numbers of processors to each application, since they can be shared across applications through multi-threading. In this paper, we provide answers to (i) and (ii) for perfectly parallel applications. Even though the problem is shown to be NP-complete, we give key elements to determine the subset of applications that should share the LLC (while remaining ones only use their smaller private cache). Building upon these results, we design efficient heuristics for general applications. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed.
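The interplay between cache shares and the makespan can be made concrete with a small model. The sketch below uses assumed formulas and numbers, not the paper's: each perfectly parallel application's sequential work is inflated when its LLC share falls below its cache footprint, and since processors are rational and the applications perfectly parallel, giving each application processors proportional to its inflated work makes all of them finish together, so the makespan is the total inflated work divided by P.

```python
# Toy cache-partitioning model; the inflation formula and numbers are assumptions.

def inflated_work(work, footprint, share, penalty=4.0):
    """Sequential work grows when the LLC share falls below the footprint."""
    return work * (1.0 + penalty * max(0.0, footprint - share))

def makespan(apps, shares, P):
    """apps: (work, footprint) pairs; shares: LLC fraction per application.
    Perfectly parallel jobs with rational processor shares proportional to
    their inflated work all finish together, at total inflated work / P."""
    total = sum(inflated_work(w, f, x) for (w, f), x in zip(apps, shares))
    return total / P

apps = [(10.0, 0.6), (10.0, 0.2)]     # made-up (work, LLC footprint) pairs
P = 8
print(makespan(apps, [0.5, 0.5], P))  # naive equal split of the LLC
print(makespan(apps, [0.8, 0.2], P))  # cache shares follow the footprints
```

Matching the cache shares to the footprints removes all inflation here, which is the kind of gain the partitioning strategies above aim for.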

    Co-scheduling Amdahl applications on cache-partitioned systems

    Cache-partitioned architectures allow subsections of the shared last-level cache (LLC) to be exclusively reserved for some applications. This technique dramatically limits interactions between applications that are concurrently executing on a multi-core machine. Consider n applications that execute concurrently, with the objective to minimize the makespan, defined as the maximum completion time of the n applications. Key scheduling questions are: (i) which proportion of cache and (ii) how many processors should be given to each application? In this paper, we provide answers to (i) and (ii) for Amdahl applications. Even though the problem is shown to be NP-complete, we give key elements to determine the subset of applications that should share the LLC (while remaining ones only use their smaller private cache). Building upon these results, we design efficient heuristics for Amdahl applications. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed.
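Questions (i) and (ii) can be explored jointly by brute force on a toy model. The grid search below is an illustration under assumed formulas (cache misses inflate the work, then Amdahl's law caps the speedup), not one of the paper's heuristics, and the (work, alpha, footprint) triples are made up.

```python
# Toy joint search over (i) the cache proportion and (ii) the processor
# count per application. All model choices and numbers are assumptions.

def amdahl_time(work, alpha, footprint, procs, share, penalty=4.0):
    """Cache misses inflate the work, then Amdahl's law caps the speedup."""
    inflated = work * (1.0 + penalty * max(0.0, footprint - share))
    return inflated * (alpha + (1.0 - alpha) / procs)

def best_split(app1, app2, P, steps=100):
    """Grid-search the LLC share x and processor count p given to app1
    (app2 gets the rest); return (makespan, x, p) with the smallest
    makespan, i.e. the slower of the two completion times."""
    best = (float("inf"), 0.0, 0)
    for p in range(1, P):
        for i in range(steps + 1):
            x = i / steps
            span = max(amdahl_time(*app1, procs=p, share=x),
                       amdahl_time(*app2, procs=P - p, share=1.0 - x))
            best = min(best, (span, x, p))
    return best

# made-up (work, alpha, footprint) triples
app1, app2 = (10.0, 0.05, 0.6), (10.0, 0.05, 0.2)
span, x, p = best_split(app1, app2, P=8)
print(span, x, p)
```

Here the footprints fit together (0.6 + 0.2 <= 1), so the search settles on a cache split that avoids all inflation plus an even processor split; with larger footprints it starts trading cache between the two applications instead.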